F2N-Rank: Domain Keywords Extraction Algorithm
Abstract
Domain keyword extraction is important for information extraction, information retrieval, classification, clustering, and topic detection and tracking. TextRank is a common graph-based algorithm for keyword extraction, but it takes only edge weights into account. We propose a new text ranking formula, named F2N-Rank, that takes into account both edge and node weights. Experiments show that F2N-Rank clearly outperforms both TextRank and ATF*DF. In keyword extraction for the Tibetan religion domain, F2N-Rank achieves the highest average precision (78.6%), about 16% higher than TextRank and 29% higher than ATF*DF.
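The abstract does not state the F2N-Rank formula, so the following is only an illustrative sketch of the general idea of combining edge and node weights in a graph ranking. It is not the authors' method: it assumes a PageRank-style iteration in which edge weights drive the random walk and normalised node weights bias the teleport step. The function name `rank_words`, its parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict


def rank_words(edges, node_weights, damping=0.85, iterations=50):
    """Rank words on an undirected co-occurrence graph.

    edges: {(word_a, word_b): edge_weight}, assumed positive.
    node_weights: {word: node_weight}, assumed to have a positive sum.
    This is an assumed scheme, not the F2N-Rank formula from the paper.
    """
    # Build a symmetric weighted adjacency map.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(set(adj) | set(node_weights))
    total = sum(node_weights.get(n, 0.0) for n in nodes)
    # Normalised node weights serve as the teleport distribution.
    prior = {n: node_weights.get(n, 0.0) / total for n in nodes}
    out_sum = {n: sum(adj[n].values()) for n in nodes}

    score = dict(prior)
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbour passes on score in proportion to edge weight.
            incoming = sum(score[m] * w / out_sum[m] for m, w in adj[n].items())
            new_score[n] = (1 - damping) * prior[n] + damping * incoming
        score = new_score
    return score


# Toy co-occurrence graph (invented data for illustration only).
toy_edges = {("temple", "monk"): 2.0, ("monk", "sutra"): 1.0, ("temple", "sutra"): 1.0}
uniform = rank_words(toy_edges, {"temple": 1.0, "monk": 1.0, "sutra": 1.0})
biased = rank_words(toy_edges, {"temple": 1.0, "monk": 1.0, "sutra": 8.0})
```

With uniform node weights this reduces to an edge-weighted, TextRank-like ranking; raising the node weight of one word raises its final score, which is the kind of effect a node-weight term is meant to provide.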
Similar resources
Improved Automatic Keyword Extraction Based on TextRank Using Domain Knowledge
Keyword extraction of scientific articles is beneficial for retrieving scientific articles of a certain topic and grasping the trend of academic development. For the task of keyword extraction for Chinese scientific articles, we adopt the framework of selecting keyword candidates by Document Frequency Accessor Variety (DF-AV) and running the TextRank algorithm on a phrase network. To improve domain ...
Passage Retrieval for Information Extraction using Distant Supervision
In this paper, we propose a keyword-based passage retrieval algorithm for information extraction, trained by distant supervision. Our goal is to be able to extract attributes of people and organizations more quickly and accurately by first ranking all the potentially relevant passages according to their likelihood of containing the answer and then performing a traditional deeper, slower analysi...
Optimizing information retrieval in question answering using syntactic annotation
One of the bottle-necks in open-domain question answering (QA) systems is the performance of the information retrieval (IR) component. In QA, IR is used to reduce the search space for answer extraction modules and therefore its performance is crucial for the success of the overall system. However, natural language questions are different to sets of keywords used in traditional IR. In this study...
Log based Keyword Extraction and Spread based Clustering for an Efficient Information Searching
Today an efficient information search is very important to extract and analyze user requirements in the vast amount of web information. For this reason, this paper proposes the log based keyword extraction method which finds the associated keywords in a certain domain. Also, this paper proposes the spread based clustering method as clustering the keywords with high association among the keyword-...
Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection
This paper regards the query keywords selection problem in source retrieval as learning a ranking model to choose the method of keywords extraction over suspicious document segments. Four basic methods are used in our ranking function: BM25, TFIDF, TF and EW. Then, a ranking model based on Ranking SVM is proposed to rank the query keywords group which is contributed to get the higher evaluation...